Abstract

We consider the following problem arising from the study of hu- 
man problem solving: Let G be a vertex-weighted graph with marked 
"in" and "out" vertices. Suppose a random walker begins at the in- 
vertex, steps to neighbors of vertices with probability proportional to 
their weights, and stops upon reaching the out-vertex. Could one de- 
duce the weights from the paths that many such walkers take? We 
analyze an iterative numerical solution to this reconstruction prob- 
lem, in particular, given the empirical mean occupation times of the 
walkers. In the process, a result concerning the differentiation of a 
matrix pseudoinverse is given, which may be of independent interest. 
We then consider the existence of a choice of weights for the given 
occupation times, formulating a natural conjecture to the effect that 
- barring obvious obstructions - a solution always exists. It is shown 
that the conjecture holds for a class of graphs that includes all trees 
and complete graphs. Several open problems are discussed. 

1 Introduction 



Single-agent search problems are commonly modeled as a graph $G$, with an
edge from $x \in V(G)$ to $y \in V(G)$ (i.e., $x \sim y$) if it is possible to move from
state $x$ to state $y$. We will assume throughout that such "moves" $x \to y$ are
reversible, so that $G$ is an undirected graph. One particular vertex $v_{\text{out}}$ is
the "finish" and another vertex $v_{\text{in}}$ is the "start." The former is intended to
model the solution of the problem being considered, and the latter the initial
state of the solver.

Typical examples of such single-agent search problems include:

1. Vertices are states of a Rubik's Cube or a 15-puzzle, with an edge
between two vertices if it is possible to transform one into the other by
a standard move. Here $v_{\text{in}}$ is the starting state (perhaps the result of
a random walk in $G$) and $v_{\text{out}}$ is the solved puzzle.

2. Vertices are web pages, with edges corresponding to hyperlinks. In this
case, $v_{\text{in}}$ may be a company homepage, and $v_{\text{out}}$ a page where purchases
are made (the "check-out").

3. Vertices are the positions of a chess board, edges correspond to legal
moves by one player (perhaps a computer), $v_{\text{in}}$ is the initial position
given by a chess puzzle, and $v_{\text{out}}$ is the set of all winning configurations
(checkmates, captures, etc.).

4. The vertices are a grid of points in a mouse maze, with edges
corresponding to feasible moves (i.e., missing walls); $v_{\text{in}}$ is the cage door and
$v_{\text{out}}$ is the cheese.

In many such examples, a researcher has access to the state of the solver,
but not to their reasoning process (their "policy," to use machine learning
parlance). The amount of time a subject takes to find the solution state
(the "latency") can serve as a useful proxy for their knowledge level, but this
single number is a somewhat crude measurement. One might strive to learn
in addition the value attributed by the solver to intermediate states, i.e.,
the solver's "value function." Such detailed profiles of preferences could aid
in, for example, improving customer service, evaluating individual expertise,
estimating how well a lab animal has learned a task, tuning a software
game-playing engine, or identifying gaps in students' knowledge. However, the
solver (human, lab animal, or machine) may be long gone, may not have
conscious knowledge of this information, may be secretive, or may not be
able to express their thoughts in a human-readable format. Nonetheless, by
studying the paths that many instances of the solver take, one could hope
to reconstruct such valuational ascriptions without the involvement of the
solvers. This strategy is akin to using the density of oil stains in a parking
lot to see which spots are most popular, or classifying historical road use by
the depth of wheel-ruts.

We model the solution process as a random walk on the graph $G$, starting
at $v_{\text{in}}$ and ending at $v_{\text{out}}$. Vertex weights specify the proportional
probabilities of moves, and encode the aforementioned value function. A novice solver
is presumed to follow a uniform random walk; that is, the transition
probability to go from a state to one of its neighbors is the same for each neighbor.
The expert solver follows a more direct route from start to finish, as they are
inclined to move closer to the solution state with each move.

In the next section, we describe our model in greater detail and relate the
vertex weights to empirical mean occupation times. The following section
presents an iterative algorithm for the numerical solution of the problem of
determining the weights from occupation times. The analysis requires
differentiation of a matrix pseudoinverse, something which may have independent
interest. Next, we discuss the matter of solution existence: When is it
possible in principle to reconstruct the vertex weights? We formulate a natural
conjecture and prove that it holds for a class of graphs that includes all trees
and complete graphs. The final section discusses several open problems that
have arisen in this context.



2 The Model 

For each pair of vertices $x, y \in V$, we denote by $d(x, y)$ the graphical distance
between $x$ and $y$, i.e., the length of the shortest path that begins at $x$ and
ends at $y$. $N(x)$ denotes the "neighborhood" of $x$, i.e., the set of all vertices
adjacent to $x$. The quantity $\deg(v)$, the "degree" of $v$, refers to the number
of edges incident to $v \in V$.

Let $p : V \to \mathbb{R}^{>0}$ be nonincreasing in distance from $v_{\text{out}}$, i.e.,
$d(x, v_{\text{out}}) > d(z, v_{\text{out}}) \Rightarrow p(x) \le p(z)$ for each $x, z \in V$. Then we define $P(x,y)$, the
probability that the solver transitions from state $x$ to a neighboring state $y \in N(x)$, by

\[
P(x,y) = \frac{p(y)}{\sum_{z \in N(x)} p(z)}.
\]

We also use the notation $P^{t}(x,y)$ to mean the probability that a walker
starting from $x$ arrives at $y$ after exactly $t$ steps. Such a distribution
corresponds precisely to a reversible Markov chain, starting from $v_{\text{in}}$ and halted
at $v_{\text{out}}$.
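
As a minimal sketch of this definition, the following Python computes the transition probabilities $P(x,y)$ from vertex weights. The function name `transition_probabilities`, the adjacency-dictionary representation, the integer weights, and the four-vertex example graph are all illustrative assumptions rather than anything specified in the text.

```python
from fractions import Fraction

def transition_probabilities(adj, p):
    """Return P[x][y] = p(y) / (sum of p(z) over neighbours z of x)."""
    P = {}
    for x, nbrs in adj.items():
        total = sum(p[z] for z in nbrs)
        P[x] = {y: Fraction(p[y], total) for y in nbrs}
    return P

# Hypothetical example: a star with centre "b" and leaves "a", "c", "d".
adj = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}
p = {"a": 1, "b": 2, "c": 3, "d": 4}  # assumed integer vertex weights
P = transition_probabilities(adj, p)
```

Each row of `P` sums to one, and only the weights of the neighbors of `x` enter `P[x]`, so rescaling all weights by a common factor leaves the walk unchanged.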

Given such a group of solvers, we have an empirical mean "occupation
time" for each vertex $v$, given by

\[
\hat\tau(v) = \frac{1}{N} \sum_{k=1}^{N} \tau_k(v),
\]

where $N$ is the number of subjects and $\tau_k(v)$ is the number of visits to site $v$ that subject $k$ makes before
arriving at $v_{\text{out}}$. It would be useful to understand how $p$ relates to $\hat\tau$.
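
The empirical quantity above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `simulate_walk` and `mean_occupation_times`, the three-vertex path, the unit weights, and the fixed random seed are all hypothetical, and each walk is counted as visiting $v_{\text{out}}$ exactly once.

```python
import random

def simulate_walk(adj, p, v_in, v_out, rng):
    """One walk from v_in, stepping to neighbours with probability
    proportional to their weights, halted on reaching v_out."""
    walk = [v_in]
    while walk[-1] != v_out:
        nbrs = adj[walk[-1]]
        walk.append(rng.choices(nbrs, weights=[p[y] for y in nbrs])[0])
    return walk

def mean_occupation_times(walks, vertices):
    """Average number of visits to each vertex over all recorded walks."""
    counts = {v: 0 for v in vertices}
    for walk in walks:
        for v in walk:
            counts[v] += 1
    return {v: counts[v] / len(walks) for v in vertices}

# Hypothetical example: the path out - m - in with unit weights.
adj = {"out": ["m"], "m": ["out", "in"], "in": ["m"]}
p = {"out": 1.0, "m": 1.0, "in": 1.0}
rng = random.Random(0)
walks = [simulate_walk(adj, p, "in", "out", rng) for _ in range(20000)]
tau_hat = mean_occupation_times(walks, adj)
```

On this path the expected number of visits to each of the two non-terminal vertices is 2, so the empirical means should be close to that value.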

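Suppose that we perform a random walk on $G$ according to the
distribution $P$ arising from $p$ as above, starting at $v_{\text{in}}$ and stopping at the
hitting time of $v_{\text{out}}$. Note that the random walk arising from $P$ is also
the random walk one gets by taking edge weights $\mathrm{wt}(x,y) = p(x)p(y)$,
since the ratio of weights of neighbors of a point is the same. If we define
$\rho(x) = \sum_{y \sim x} \mathrm{wt}(x,y) = p(x) \sum_{y \sim x} p(y)$, then the corresponding stationary
distribution at the point $x$ is $\rho(x)/\mathrm{vol}(G)$, where

\[
\mathrm{vol}(G) = \sum_{y \in V} \rho(y) = 2 \sum_{\{y,z\} \in E(G)} \mathrm{wt}(y,z).
\]

Applying [1], Chapter 2, Lemma 9, we have

\[
E(\tau(x)) = \frac{\rho(x)}{\mathrm{vol}(G)} \cdot \bigl( E(v_{\text{in}} \to v_{\text{out}}) + E(v_{\text{out}} \to x) - E(v_{\text{in}} \to x) \bigr),
\]

where $(x \to y)$ is the time that a walk begun at $x$ hits $y$ for the first time.
(We adopt the convention that $(x \to x) = 0$.) Furthermore, we have ([3],
Theorem 8)

\[
E(x \to y) = \mathrm{vol}(G) \left( \frac{G(y,y)}{\rho(y)} - \frac{G(x,y)}{\rho(x)} \right),
\]

where $G(x, y)$ is the discrete Green's function for $G$ with weights $\rho(\cdot)$, whence

\[
E(\tau(x)) = \rho(x) \cdot \left( \frac{G(v_{\text{out}}, v_{\text{out}})}{\rho(v_{\text{out}})} - \frac{G(v_{\text{in}}, v_{\text{out}})}{\rho(v_{\text{in}})} - \frac{G(v_{\text{out}}, x)}{\rho(v_{\text{out}})} + \frac{G(v_{\text{in}}, x)}{\rho(v_{\text{in}})} \right).
\]

The matrix $G$ of values $G(x, y)$ is given by ([3], (16))

\[
G = T^{1/2} \mathcal{G} T^{-1/2} = T^{1/2} \left( \sum_{i=1}^{n-1} \frac{1}{\lambda_i} \phi_i \phi_i^{*} \right) T^{-1/2}, \tag{1}
\]
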
where $T = \operatorname{diag}(\rho_1, \ldots, \rho_n)$ is the diagonal matrix of weighted degrees,
$0 = \lambda_0 < \lambda_1 \le \cdots \le \lambda_{n-1}$ are the eigenvalues of
the normalized Laplacian $\mathcal{L}$, and $\phi_0, \ldots, \phi_{n-1}$ are the corresponding
eigenvectors. The normalized Laplacian is, in turn, defined to be $T^{-1/2} L T^{-1/2}$, where
$L$ is the combinatorial Laplacian:

\[
L(x,y) = \begin{cases} \rho(x) & \text{if } x = y, \\ -\mathrm{wt}(x,y) & \text{if } x \sim y, \\ 0 & \text{otherwise.} \end{cases}
\]

Ultimately, our objective is to reconstruct the function $p$ from $\hat\tau$. There are
$n - 1$ unknowns that define $p(\cdot)$ (recall that $p(v_{\text{out}}) = 1$) and $n - 1$ degrees
of freedom in $\hat\tau$, so that such a reconstruction is reasonable to attempt. A
maximum-likelihood estimator for $p$ seems out of reach, however, since the
relationship defining $\tau$ from $p$ is so complicated. Therefore, we adopt a
standard simplification: the method of moments. That is, we try to solve
$E(\tau) = \hat\tau$ for $p$.



3 Numerical Solution

This problem, though simpler, is still analytically intractable.
Nonetheless, one can approximate $p$ by iterative numerical methods. Consider the
following algorithm:

1. For each solver, track how many times they visit each site $v \in V$ as
they traverse the graph from $v_{\text{in}}$ to $v_{\text{out}}$. Let the average number of
visits for each group be $\hat\tau(v)$.

2. Without loss of generality, we restrict our attention to the induced
subgraph $G[\operatorname{supp}(\hat\tau)]$.

3. Start with a uniform distribution $p : V \to [0, 1]$, i.e., $p(v) = 1$.

4. Apply a steepest-descent strategy to the cost function $\Phi$ (defined
below).

Meaningful information could be extracted from the resulting $p_{\text{final}}$ by, for
example, performing a regression against some notion of distance to $v_{\text{out}}$:
graphical distance, electrical resistance, etc.

Define $\tau_p = E(\tau)$, and let

\[
\Phi(p) = \| \hat\tau - \tau_p \|_2^2 = (\hat\tau - \tau_p)^{*} (\hat\tau - \tau_p), \tag{2}
\]

where we are treating functions of $V$ as vectors in $\mathbb{R}^n$. Let

\[
\Delta_x = \frac{\partial \Phi}{\partial p(x)}.
\]

We may then apply, for example, steepest-descent, guided by the gradient
vector $(\Delta_x)_{x \in V}$.
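
The iterative scheme can be sketched as follows. This is a hedged illustration, not the analysis carried out in this section: it replaces the analytic gradient with central finite differences, computes $\tau_p$ by solving a small linear system rather than through the Green's function, and uses an Armijo backtracking rule for the step size. The function names, the step-size constants, and the three-vertex example are assumptions.

```python
def expected_visits(adj, p, v_in, v_out):
    """Expected visit counts for a walk from v_in absorbed at v_out (floats)."""
    verts = [v for v in adj if v != v_out]
    idx = {v: i for i, v in enumerate(verts)}
    m = len(verts)
    A = [[0.0] * (m + 1) for _ in range(m)]  # augmented system (I - Q^T | e_in)
    for v in verts:
        A[idx[v]][idx[v]] += 1.0
        total = sum(p[z] for z in adj[v])
        for y in adj[v]:
            if y != v_out:
                A[idx[y]][idx[v]] -= p[y] / total
    A[idx[v_in]][m] = 1.0
    for c in range(m):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [a / pivot for a in A[c]]
        for r in range(m):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    tau = {v: A[idx[v]][m] for v in verts}
    tau[v_out] = 1.0
    return tau

def cost(adj, p, v_in, v_out, tau_hat):
    """Phi(p) = || tau_hat - tau_p ||_2^2."""
    tau = expected_visits(adj, p, v_in, v_out)
    return sum((tau_hat[v] - tau[v]) ** 2 for v in adj)

def steepest_descent(adj, v_in, v_out, tau_hat, iters=100, h=1e-6):
    p = {v: 1.0 for v in adj}              # uniform start (step 3)
    free = [v for v in adj if v != v_out]  # p(v_out) stays fixed at 1
    phi = lambda q: cost(adj, q, v_in, v_out, tau_hat)
    for _ in range(iters):
        grad = {}
        for x in free:                     # central finite differences
            up, down = dict(p), dict(p)
            up[x] += h
            down[x] -= h
            grad[x] = (phi(up) - phi(down)) / (2 * h)
        g2 = sum(g * g for g in grad.values())
        current, step, moved = phi(p), 1.0, False
        while step > 1e-12 and not moved:  # Armijo backtracking (assumed rule)
            trial = dict(p)
            for x in free:
                trial[x] = p[x] - step * grad[x]
            if min(trial.values()) > 0 and phi(trial) <= current - 1e-4 * step * g2:
                p, moved = trial, True
            step /= 2
        if not moved:
            break
    return p

# Hypothetical target: occupation times generated by weights (1, 1, 3).
adj = {"out": ["m"], "m": ["out", "in"], "in": ["m"]}
tau_hat = {"out": 1.0, "m": 4.0, "in": 4.0}
p_fit = steepest_descent(adj, "in", "out", tau_hat)
```

On this small path the cost depends only on the weight of the start vertex, so the descent recovers that weight; the weight of the middle vertex is not identifiable and stays at its initial value.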

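Applying ([5]), we have

\[
\Delta_x = (\tau_p - \hat\tau)^{*} \frac{\partial (\tau_p - \hat\tau)}{\partial p(x)} + \frac{\partial (\tau_p - \hat\tau)^{*}}{\partial p(x)} (\tau_p - \hat\tau) = 2 (\tau_p - \hat\tau)^{*} \frac{\partial \tau_p}{\partial p(x)}.
\]

To simplify this, write $\rho(y) = p(y) \sum_{z \sim y} p(z)$ for the weighted degree of $y$, and first note that
$\partial \rho(y)/\partial p(x)$ is $p(y)$ if $y \sim x$, $\sum_{z \sim x} p(z)$ if
$y = x$, and $0$ otherwise. Furthermore, $\partial \mathrm{vol}(G)/\partial p(x) = 2 \sum_{y \sim x} p(y)$. We
can then write

\[
\frac{\partial \tau_p}{\partial p(x)} = \frac{\partial}{\partial p(x)} \left( \frac{\rho(x) G(v_{\text{out}}, v_{\text{out}})}{\rho(v_{\text{out}})} - \frac{\rho(x) G(v_{\text{in}}, v_{\text{out}})}{\rho(v_{\text{in}})} - \frac{\rho(x) G(v_{\text{out}}, x)}{\rho(v_{\text{out}})} + \frac{\rho(x) G(v_{\text{in}}, x)}{\rho(v_{\text{in}})} \right).
\]

Furthermore,

\[
\begin{aligned}
\frac{\partial}{\partial p(x)} \frac{\rho(x) G(a,b)}{\rho(a)}
&= \left( \frac{\partial}{\partial p(x)} \frac{\rho(x)}{\rho(a)} \right) G(a,b) + \frac{\rho(x)}{\rho(a)} \frac{\partial G(a,b)}{\partial p(x)} \\
&= \left( \rho(a) \frac{\partial \rho(x)}{\partial p(x)} - \rho(x) \frac{\partial \rho(a)}{\partial p(x)} \right) \rho(a)^{-2} G(a,b) + \frac{\rho(x)}{\rho(a)} \frac{\partial G(a,b)}{\partial p(x)}.
\end{aligned}
\]

Hence, it remains to compute $\partial G/\partial p(x)$. To that end, we have the following
result. Define the Moore-Penrose pseudoinverse (or just pseudoinverse for
short) of a real symmetric matrix $B$ of rank $n - k$ to be a matrix $A$ so that if
$x_0, \ldots, x_{k-1}$ are an orthonormal basis for $\operatorname{null}(B)$, then $AB = I - \sum_{j=0}^{k-1} x_j x_j^{*}$,
and $\operatorname{null}(A) = \operatorname{null}(B)$.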

Theorem 1. Suppose that $A$ is the pseudoinverse of the real symmetric
matrix $B$ with $\operatorname{rank}(B) = n - k$. Then

\[
A' = -(P' + AB')A - AP',
\]

where $P = \sum_{j=0}^{k-1} x_j x_j^{*}$.



Proof. Let $P_j = x_j x_j^{*}$, let $x_0, \ldots, x_{n-1}$ be orthonormal eigenvectors for $B$
(including the null vectors), and let $0 = \lambda_0 = \cdots = \lambda_{k-1} < \lambda_k \le \cdots \le \lambda_{n-1}$ be
the corresponding eigenvalues. Then, differentiating $AB = I - P$,

\[
(AB)' = A'B + AB' = -P',
\]

whence $A'B = -P' - AB'$. Since $A = \sum_{j=k}^{n-1} \lambda_j^{-1} P_j$, it is symmetric. Hence,

\[
BA = \left( \sum_{j=k}^{n-1} \lambda_j P_j \right) \left( \sum_{j=k}^{n-1} \lambda_j^{-1} P_j \right) = \sum_{j=k}^{n-1} P_j = I - \sum_{j=0}^{k-1} P_j = I - P.
\]

Then, right-multiplying by $A$ the expression for $A'B$ above,

\[
A'BA = A'(I - P) = -(P' + AB')A, \tag{3}
\]

so we may rewrite this as

\[
A' = -(P' + AB')A + A'P.
\]

On the other hand, $AP = \sum_{j=0}^{k-1} A x_j x_j^{*} = 0$, so differentiating gives

\[
A'P = -AP',
\]

which we may apply to (3) to get

\[
A' = -(P' + AB')A - AP'.
\]

□
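
A quick numerical sanity check of Theorem 1 is possible on a small example. The sketch below is only an illustration under assumptions: it uses a hypothetical one-parameter family of rank-one $2 \times 2$ matrices $B(t) = (2 + t)\, u u^{*}$ with $u = (\cos t, \sin t)$, whose pseudoinverse and null-space projector are known in closed form, and it approximates all derivatives by central differences.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def combine(X, Y, a=1.0, b=1.0):
    """Entrywise a*X + b*Y."""
    return [[a * X[i][j] + b * Y[i][j] for j in range(2)] for i in range(2)]

def outer(u):
    return [[u[i] * u[j] for j in range(2)] for i in range(2)]

def B(t):  # symmetric, rank 1, with nonzero eigenvalue 2 + t
    u = (math.cos(t), math.sin(t))
    return [[(2 + t) * e for e in row] for row in outer(u)]

def A(t):  # pseudoinverse of B(t)
    u = (math.cos(t), math.sin(t))
    return [[e / (2 + t) for e in row] for row in outer(u)]

def P(t):  # projector onto null(B(t)), spanned by (-sin t, cos t)
    return outer((-math.sin(t), math.cos(t)))

def derivative(F, t, h=1e-5):
    """Central-difference approximation to F'(t)."""
    return combine(F(t + h), F(t - h), 1 / (2 * h), -1 / (2 * h))

t0 = 0.3
dA, dB, dP = derivative(A, t0), derivative(B, t0), derivative(P, t0)
# Right-hand side of Theorem 1: -(P' + A B') A - A P'
inner = combine(dP, matmul(A(t0), dB))
rhs = combine(matmul(inner, A(t0)), matmul(A(t0), dP), -1.0, -1.0)
max_error = max(abs(dA[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

The two sides agree up to the finite-difference error, which is consistent with the formula but of course does not replace the proof.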



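Now,

\[
\begin{aligned}
\frac{\partial G}{\partial p(x)}
&= \frac{1}{2} T^{-1/2} \frac{\partial T}{\partial p(x)} \mathcal{G} T^{-1/2} + T^{1/2} \frac{\partial \mathcal{G}}{\partial p(x)} T^{-1/2} - \frac{1}{2} T^{1/2} \mathcal{G} T^{-3/2} \frac{\partial T}{\partial p(x)} \\
&= \frac{1}{2} \frac{\partial T}{\partial p(x)} T^{-1} G + T^{1/2} \frac{\partial \mathcal{G}}{\partial p(x)} T^{-1/2} - \frac{1}{2} G T^{-1} \frac{\partial T}{\partial p(x)},
\end{aligned}
\]

since $T^{\alpha}$ and $\partial T/\partial p(x)$ are diagonal, and therefore commute with each other. The
diagonal of $\partial T/\partial p(x)$ has $y$-coordinate $p(y)$ if $x \ne y$ and $x \sim y$, $\sum_{z \sim x} p(z)$
if $x = y$, and $0$ otherwise. We may apply Theorem 1 to $\mathcal{G}$, since $\mathcal{G}$ is the
pseudoinverse of $\mathcal{L}$. Then (abbreviating the operator $\partial/\partial p(x)$ by $(\cdot)'$),

\[
\mathcal{G}' = -(P' + \mathcal{G} \mathcal{L}') \mathcal{G} - \mathcal{G} P',
\]

where $P = \phi_0 \phi_0^{*}$. In this expression,

\[
P' = \phi_0 (\phi_0')^{*} + \phi_0' \phi_0^{*},
\]

and the $y$ coordinate of $\phi_0$ is $\sqrt{\rho(y)/\mathrm{vol}(G)}$, whence the $y$ coordinate of $\phi_0'$
is

\[
\frac{1}{2} \sqrt{\frac{\mathrm{vol}(G)}{\rho(y)}} \, \mathrm{vol}(G)^{-2} \left( \mathrm{vol}(G) \frac{\partial \rho(y)}{\partial p(x)} - \rho(y) \frac{\partial \mathrm{vol}(G)}{\partial p(x)} \right).
\]

Finally, $\mathcal{L}'$ has $(y, z)$ entry $0$ if $y = z$, and, if $y \sim z$,

\[
\begin{aligned}
-\frac{\partial}{\partial p(x)} \frac{p(y) p(z)}{\sqrt{\rho(y) \rho(z)}}
= -(\rho(y) \rho(z))^{-1} \Biggl( & \sqrt{\rho(y) \rho(z)} \, \chi(y = x) \, p(z) + \sqrt{\rho(y) \rho(z)} \, \chi(z = x) \, p(y) \\
& - \frac{p(y) p(z)}{2 \sqrt{\rho(y) \rho(z)}} \left( \rho(y) \frac{\partial \rho(z)}{\partial p(x)} + \rho(z) \frac{\partial \rho(y)}{\partial p(x)} \right) \Biggr),
\end{aligned}
\]

where we are denoting the indicator function of an event $\mathcal{E}$ by $\chi(\mathcal{E})$. (Entries with $y \ne z$ and $y \not\sim z$ vanish identically.)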



4 Solution Existence 

It would be useful to know for certain that, for each $\hat\tau$, there does indeed
exist a set of weights $p : V \to \mathbb{R}^{>0}$ which gives rise to the desired expected
visitation times. In other words, we wish to show the existence of a $p$ so that

\[
E(\tau_p(x)) = \hat\tau(x).
\]

One could view such a $p$ as a Method-of-Moments estimator for the weight
function of $V$. Note that it is certainly impossible to solve for $p$ if $\hat\tau$ has
disconnected support as an induced subgraph of $G$. Indeed, the set of vertices
visited by a random walk $\omega = (v_{\text{in}} = v_0, v_1, \ldots, v_{T-1}, v_T = v_{\text{out}})$ is connected.
Write $\mathrm{tr}_\omega$, the "trace" of $\omega$, to be the function $\mathrm{tr}_\omega(\cdot) : V \to \mathbb{Z}$ whose value at
$v$ is simply the number of occurrences of $v$ in $\omega$, i.e.,

\[
\mathrm{tr}_\omega(x) = |\{ j : 0 \le j \le T, \ v_j = x \}|.
\]

We say that a walk $\omega = (v_{\text{in}}, v_1, \ldots, v_{T-1}, v_{\text{out}})$ is "proper" if $\mathrm{tr}_\omega(v_{\text{out}}) = 1$.
Recall that, for a function $p : V \to \mathbb{R}^{>0}$ and $v \in V$, we define $\tau_p(v)$
to be the expected number of visits to $v$ of a random walk that starts at
$v_{\text{in}}$, navigates $G$ according to $p$, and ends at its first encounter with $v_{\text{out}}$.
We say that the equation $\tau_p = \tau$ is "solvable" if there exists a $p$ with all
positive coordinates so that the equation holds. Note that we may restrict
our attention to those $G$ so that $G' = G \setminus v_{\text{out}}$ is connected, since any
component of $G'$ not containing $v_{\text{in}}$ cannot be visited by any proper walk.
Finally, define $\chi_v$ to be the characteristic function of the vertex $v \in V(G)$
and $\chi_e$ to be $\chi_x + \chi_y$ for any edge $e = \{x, y\} \in E(G)$.
For any choice of $p$, one can write

\[
\tau_p = \sum_{\text{proper } \omega} \mathrm{tr}_\omega \, P(\omega),
\]

where $P(\omega)$ is the probability that the walk $\omega$ occurs given the weighting $p$.
Therefore, if $\tau_p = \tau$ is solvable, then $\tau$ lies in the convex hull of the traces
of all proper walks $\mathrm{tr}_\omega$. It is not hard to see that $\tau$ actually lies in $\Psi_G$,
the interior with respect to a minimal containing hyperplane of the convex
hull of the vectors $\mathrm{tr}_\omega \in \mathbb{R}^n$. This minimal containing hyperplane $\mathcal{H}$ is not
full-dimensional, as the next lemma describes.

Lemma 2. $\dim(\mathcal{H}) = n - 1$ if $G$ is not bipartite and $n - 2$ if $G$ is bipartite.

Proof. First of all, $\dim(\mathcal{H}) \le n - 1$, since $\mathrm{tr}_\omega(v_{\text{out}}) = 1$. We may write

\[
\mathcal{H} = \mathrm{tr}_\omega + \operatorname{span}(\{ \mathrm{tr}_{\omega'} - \mathrm{tr}_\omega \}_{\text{proper } \omega'})
\]

for any proper walk $\omega$. It therefore suffices to determine the dimension of
$S = \operatorname{span}(\{ \mathrm{tr}_\omega - \mathrm{tr}_{\omega'} \}_{\omega'})$, for $\omega = (v_0, \ldots, v_T)$ some fixed proper walk which
passes through every edge not incident to $v_{\text{out}}$. Such a walk exists, since
$G' = G \setminus v_{\text{out}}$ is connected. Given an edge $e \in E(G')$, we define the walk $\omega_e$
by

\[
\omega_e = (v_0, v_1, \ldots, v_t, v_{t+1}, v_t, v_{t+1}, v_{t+2}, \ldots, v_{T-1}, v_T),
\]

where $t$ is the least index so that $\{v_t, v_{t+1}\} = e$. Then

\[
\mathrm{tr}_{\omega_e} - \mathrm{tr}_\omega = \chi_{v_t} + \chi_{v_{t+1}} = \chi_e.
\]

Therefore, if $v$ is adjacent to $v_{\text{in}}$ in $G'$, then $\chi_{v_{\text{in}}} + \chi_v \in S$. If $w$ is adjacent
to a vertex $v$ which is adjacent to $v_{\text{in}}$, then

\[
(\chi_{v_{\text{in}}} + \chi_v) - (\chi_v + \chi_w) = \chi_{v_{\text{in}}} - \chi_w \in S.
\]

Proceeding inductively, we see that, if there is a path of length $\ell$ from $v_{\text{in}}$ to
$v$ in $G'$, then

\[
\chi_{v_{\text{in}}} - (-1)^{\ell} \chi_v \in S. \tag{4}
\]

Since the functions $\chi_w$ are linearly independent for $w \in G'$, this shows
immediately that $\dim(S) \ge n - 2$.

Suppose that $G$ is not bipartite. Since $G'$ is connected, there are two
proper paths of length $\ell_1$ and $\ell_2$, where $\ell_1$ and $\ell_2$ differ in parity, from $v_{\text{in}}$ to
$v_{\text{out}}$. Therefore,

\[
\tfrac{1}{2} \bigl[ (\chi_{v_{\text{in}}} - (-1)^{\ell_1} \chi_{v_{\text{out}}}) + (\chi_{v_{\text{in}}} - (-1)^{\ell_2} \chi_{v_{\text{out}}}) \bigr] = \chi_{v_{\text{in}}} \in S.
\]

Subtracting this quantity from (4), we have that $\chi_v \in S$ for all $v \in V(G')$,
so $\dim(\mathcal{H}) = n - 1$. On the other hand, if $G$ is bipartite, then there is
a function $c : V(G) \to \{-1, 1\}$ inducing the bipartition. For any proper
walk $\omega' = (w_0, \ldots, w_{T'})$, $c(w_j)$ alternates as $j$ goes from $0$ to $T'$. Hence,
$c \cdot \mathrm{tr}_{\omega'} \in \{-1, 0, 1\}$ (where we think of both factors in this dot product as
vectors in $\mathbb{R}^n$) has the same value for any proper walk $\omega'$. We may conclude
that

\[
c \cdot (\mathrm{tr}_\omega - \mathrm{tr}_{\omega'}) = 0,
\]

so that $S \perp \operatorname{span}\{c, \chi_{v_{\text{out}}}\}$. Since this span is clearly two-dimensional,
$\dim(\mathcal{H}) = \dim(S) = n - 2$. □
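
The bipartite half of this argument can be illustrated concretely. The sketch below assumes a hypothetical example, the 4-cycle a-b-c-d with $v_{\text{in}}$ = "a" and $v_{\text{out}}$ = "c"; the helper names `trace` and `signed_trace` are illustrative. It computes $c \cdot \mathrm{tr}_\omega$ for a few proper walks under a $\pm 1$ coloring $c$.

```python
from collections import Counter

def trace(walk):
    """tr_omega: the number of occurrences of each vertex in the walk."""
    return Counter(walk)

def signed_trace(walk, colour):
    """Dot product c . tr_omega for a +/-1 colouring c of the vertices."""
    return sum(colour[v] * count for v, count in trace(walk).items())

# Hypothetical bipartite example: the 4-cycle a - b - c - d - a.
colour = {"a": 1, "b": -1, "c": 1, "d": -1}
proper_walks = [
    ["a", "b", "c"],
    ["a", "d", "a", "b", "c"],
    ["a", "b", "a", "d", "a", "d", "c"],
]
values = [signed_trace(w, colour) for w in proper_walks]
```

All three walks give the same signed trace, which is the linear constraint that removes one dimension from $\mathcal{H}$ in the bipartite case.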

Conjecture 1. The equation

\[
\tau_p = \tau
\]

is solvable if and only if $\tau$ lies in the relative interior of the convex hull of
the $\mathrm{tr}_\omega \in \mathbb{R}^n$, for all proper walks $\omega$.

Necessity is immediate, by considering the set of all proper walks weighted
by their probabilities. We begin our attack on sufficiency modestly. Define

\[
p^{*}(w) := \frac{\rho(w)}{p(w)} = \sum_{z \sim w} p(z),
\]

and write $e_w$ for the elementary vector with nonzero coordinate at $w \in V$,
i.e., the indicator function of $w$.

Theorem 3. Fix $\tau \in \mathbb{R}^{V(G)}$. Let $a > 0$ and suppose that $\tau_p = \tau - a e_v$ is
solvable. Let $G'$ be the graph obtained from $G$ by attaching a vertex $v'$ of
degree 1 to $v \in V$, and let $\tau' : G' \to \mathbb{R}$ be defined by

\[
\tau'(w) = \begin{cases} \tau(w) & \text{if } w \in G, \\ a & \text{if } w = v'. \end{cases}
\]

Then $\tau_{p'} = \tau'$ is solvable (whence $\tau' \in \Psi(G')$).
Proof. By hypothesis, we can solve

\[
\tau_p(w) = (\tau - a e_v)(w) = \begin{cases} \tau(w) & \text{if } w \in G \setminus v, \\ \tau(v) - a & \text{if } w = v \end{cases}
\]

for $p$. Define $p'|_G = p|_G$ and

\[
p'(v') = \frac{p^{*}(v) \, a}{\tau(v) - a}.
\]

Call a visit to $v$ "initial" if it is not immediately preceded by a visit to $v'$.
Note that, at every visit to $v$ of a random walk according to $p'$, the probability
of visiting $v'$ on the next step is $p'(v') / (p'(v') + p^{*}(v))$. Hence, the expected number of
visits to $v'$ that occur with each initial visit to $v$ is

\[
\begin{aligned}
\alpha := \sum_{k \ge 1} \left( \frac{p'(v')}{p'(v') + p^{*}(v)} \right)^{k}
&= \frac{p'(v')}{p'(v') + p^{*}(v)} \cdot \frac{p'(v') + p^{*}(v)}{p^{*}(v)} \\
&= \frac{p^{*}(v) \, a / (\tau(v) - a)}{p^{*}(v)} = \frac{a}{\tau(v) - a}.
\end{aligned}
\]

Now, if we excise from the walks according to $p'$ the steps immediately
following each initial visit to $v$ up until (but not including) the next time that
the walk is neither at $v$ nor $v'$, the distribution of the resulting walks
proceeds according to $p$ on $G$. It is easy to see then that there are an expected
$\tau(v) - a$ number of initial visits to $v$ in a walk according to $p'$, which implies
that

\[
\tau_{p'}(v') = (\tau(v) - a) \cdot \alpha = a.
\]

On the other hand, since each visit to $v'$ is immediately followed by a visit
to $v$, the expected number of visits to $v$ under $p'$ is simply

\[
\tau_{p'}(v) = \tau_p(v)(1 + \alpha) = (\tau(v) - a)(1 + \alpha) = \tau(v).
\]

Finally, since projecting the $p'$-walk onto $G$ via excision (as described above)
yields a $p$-walk,

\[
\tau_{p'}(w) = \tau_p(w) = \tau(w)
\]

for each $w \in G \setminus \{v, v'\}$. This in turn implies that $\tau_{p'} = \tau'$ is solvable. □
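
The pendant-vertex construction in this proof can be checked exactly on a small case. The sketch below is an illustration under assumptions: the solver `expected_visits` (a linear solve over the non-absorbing vertices), the three-vertex path, the weights, and the choice $a = 1$ are all hypothetical. It attaches a leaf to $v$ with weight $p'(v') = p^{*}(v)\,a/(\tau(v) - a)$ and recomputes the expected visits.

```python
from fractions import Fraction

def expected_visits(adj, p, v_in, v_out):
    """Exact expected visit counts for a walk from v_in absorbed at v_out."""
    verts = [v for v in adj if v != v_out]
    idx = {v: i for i, v in enumerate(verts)}
    m = len(verts)
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]  # (I - Q^T | e_in)
    for v in verts:
        A[idx[v]][idx[v]] += 1
        total = sum(p[z] for z in adj[v])
        for y in adj[v]:
            if y != v_out:
                A[idx[y]][idx[v]] -= Fraction(p[y]) / total
    A[idx[v_in]][m] = Fraction(1)
    for c in range(m):  # Gauss-Jordan elimination
        piv = next(r for r in range(c, m) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(m):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    tau = {v: A[idx[v]][m] for v in verts}
    tau[v_out] = Fraction(1)
    return tau

# Hypothetical G: the path out - m - in with weights (1, 1, 2).
adj = {"out": ["m"], "m": ["out", "in"], "in": ["m"]}
p = {"out": Fraction(1), "m": Fraction(1), "in": Fraction(2)}
tau_p = expected_visits(adj, p, "in", "out")

v, a = "m", Fraction(1)
tau_v = tau_p[v] + a                 # tau(v), chosen so tau_p(v) = tau(v) - a
p_star = sum(p[z] for z in adj[v])   # p*(v): total weight of neighbours of v

# Build G' by attaching the leaf v' = "leaf" to v.
adj2 = {x: list(nbrs) for x, nbrs in adj.items()}
adj2[v].append("leaf")
adj2["leaf"] = [v]
p2 = dict(p)
p2["leaf"] = p_star * a / (tau_v - a)
tau_p2 = expected_visits(adj2, p2, "in", "out")
```

In this example the leaf receives exactly $a$ expected visits, $v$ gains $a$ visits, and the remaining vertices are unaffected.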

Corollary 4. Suppose that $\tau_p = \tau$ is solvable for every $\tau \in \Psi(G)$. Let $G'$ be
the graph obtained from $G$ by attaching a vertex $v'$ of degree 1 to $v \in V$, and
let $\tau' : G' \to \mathbb{R}$ be defined by

\[
\tau'(w) = \begin{cases} \tau(w) & \text{if } w \in G, \\ a & \text{if } w = v'. \end{cases}
\]

If $\tau' \in \Psi(G')$, then $\tau_{p'} = \tau'$ is solvable.

Proof. By the preceding theorem, we need only show that $\tau' \in \Psi(G')$ implies
$\tau - a e_v \in \Psi(G)$. Therefore, suppose that $\tau' \in \Psi(G')$, so we may write

\[
\tau' = \sum_{\omega} \lambda_\omega \mathrm{tr}_\omega,
\]

where the coefficients $\lambda_\omega \ge 0$ sum to 1 and $\omega$ ranges over proper $G'$-walks.
For each $G'$-walk $\omega$, let $\bar\omega$ be the walk obtained from $\omega$ by the excision process
described in the preceding proof. Note that

\[
\mathrm{tr}_{\bar\omega} = \mathrm{tr}_\omega|_{V(G)} - \mathrm{tr}_\omega(v') e_v.
\]

Hence,

\[
\sum_{\omega} \lambda_\omega \mathrm{tr}_{\bar\omega} = \sum_{\omega} \lambda_\omega \mathrm{tr}_\omega|_{V(G)} - \sum_{\omega} \lambda_\omega \mathrm{tr}_\omega(v') e_v = \tau'|_{V(G)} - \tau'(v') e_v = \tau - a e_v.
\]

To see that the point $\tau - a e_v$ is actually in the interior of the convex
hull, simply note that the open mapping theorem implies that the map
$(x_1, \ldots, x_n) \mapsto (x_1, \ldots, x_{n-2}, x_{n-1} - x_n)$ (and any map obtained by
permuting coordinates) from the minimal containing hyperplane of $\Psi(G')$ to its image
preserves open sets. The conclusion follows immediately. □

Theorem 5. Assume that $G$ has two vertices $v, w \in G \setminus \{v_{\text{in}}, v_{\text{out}}\}$ such that
$N(v) = N(w)$. Further suppose that $\tau_p = \tau'$ is solvable for every $\tau' \in \Psi(G')$,
where $G' = G - w$. If $\tau \in \Psi_G$, then $\tau_p = \tau$ is solvable.

Proof. Define $\tau' : V(G') \to \mathbb{R}$ to be

\[
\tau'(u) = \begin{cases} \tau(u) & \text{if } u \ne v, \\ \tau(v) + \tau(w) & \text{if } u = v. \end{cases}
\]

Since $\tau \in \Psi(G)$, we can write $\tau = \sum_{\omega} \lambda_\omega \mathrm{tr}_\omega$. It is easy to see that, if we
write $\omega'$ for the walk obtained from $\omega$ by replacing each occurrence of $w$ with
$v$, then

\[
\tau' = \sum_{\omega} \lambda_\omega \mathrm{tr}_{\omega'}.
\]

Since $v$ and $w$ have identical neighborhoods, $\omega'$ is a bona fide $G'$-walk for each
$G$-walk $\omega$. The open mapping theorem implies that the map $(x_1, \ldots, x_n) \mapsto
(x_1, \ldots, x_{n-2}, x_{n-1} + x_n)$ (and any map obtained by permuting coordinates)
from the minimal containing hyperplane of $\Psi(G)$ to its image preserves open
sets. Hence, $\tau' \in \Psi(G')$, and, by hypothesis, we can solve $\tau_p = \tau'$.

Now, for $\alpha \in [0, 1]$, let $p_\alpha$ agree with $p$ on $G \setminus \{v, w\}$, $p_\alpha(v) = \alpha p(v)$, and
$p_\alpha(w) = (1 - \alpha) p(v)$. A $p_\alpha$-walk visits the set $\{v, w\}$ an expected $\tau(v) + \tau(w)$
number of times, with each visit going to $v$ with probability $\alpha$ and going to
$w$ with probability $1 - \alpha$. Therefore, the expected number of visits to $v$ is
$(\tau(v) + \tau(w)) \alpha$ and the expected number of visits to $w$ is $(\tau(v) + \tau(w))(1 - \alpha)$.
We can set $\alpha = \tau(v)/(\tau(v) + \tau(w))$ so that $\tau_{p_\alpha} = \tau$. □
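
The weight-splitting step for twin vertices can likewise be checked exactly. In this hedged sketch the solver `expected_visits`, the graphs, and the numbers are all assumptions chosen for illustration: $G'$ is the path out - v - in with weights (1, 1, 2), and $G$ adds a twin $w$ of $v$; the weight of $v$ is then split in the ratio $\alpha : (1 - \alpha)$.

```python
from fractions import Fraction

def expected_visits(adj, p, v_in, v_out):
    """Exact expected visit counts for a walk from v_in absorbed at v_out."""
    verts = [v for v in adj if v != v_out]
    idx = {v: i for i, v in enumerate(verts)}
    m = len(verts)
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]  # (I - Q^T | e_in)
    for v in verts:
        A[idx[v]][idx[v]] += 1
        total = sum(p[z] for z in adj[v])
        for y in adj[v]:
            if y != v_out:
                A[idx[y]][idx[v]] -= Fraction(p[y]) / total
    A[idx[v_in]][m] = Fraction(1)
    for c in range(m):  # Gauss-Jordan elimination
        piv = next(r for r in range(c, m) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(m):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    tau = {v: A[idx[v]][m] for v in verts}
    tau[v_out] = Fraction(1)
    return tau

# Hypothetical G' (twin removed) and a weighting p on it.
adj_small = {"out": ["v"], "v": ["out", "in"], "in": ["v"]}
p = {"out": Fraction(1), "v": Fraction(1), "in": Fraction(2)}
tau_small = expected_visits(adj_small, p, "in", "out")  # tau'(v) = 3

# G: add w with the same neighbourhood as v; target tau(v) = 2, tau(w) = 1.
adj_big = {"out": ["v", "w"], "v": ["out", "in"],
           "w": ["out", "in"], "in": ["v", "w"]}
target_v, target_w = Fraction(2), Fraction(1)
alpha = target_v / (target_v + target_w)
p_alpha = dict(p)
p_alpha["v"] = alpha * p["v"]
p_alpha["w"] = (1 - alpha) * p["v"]
tau_big = expected_visits(adj_big, p_alpha, "in", "out")
```

The combined visits to the twins match $\tau'(v)$ on the smaller graph, and they split between $v$ and $w$ in the chosen ratio.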

Dealing with the case of a path would be useful at this point. In that
case, we write the vertices of $G$ in order: $v_1 = v_{\text{out}}, v_2, \ldots, v_{n-1}, v_n = v_{\text{in}}$.
Write $\mathbb{R}^{+}$ for the nonnegative reals and $\mathbb{R}^{++}$ for the positive reals.

Proposition 6. Suppose $G$ is a path. The set $\Psi_G$ is precisely the set of
vectors of the form $\mathbf{1} + \sum_{j=2}^{n-1} \alpha_j f_j$, where $f_j = e_j + e_{j+1}$ and $\alpha_j > 0$ for each
$2 \le j \le n - 1$.

Proof. We actually show that the topological closure $\overline{\Psi}_G$ of $\Psi_G$ is of the form
$\mathbf{1} + \sum_{j=2}^{n-1} \mathbb{R}^{+} f_j$. It is easy to see that the conclusion $\mathbf{1} + \sum_{j=2}^{n-1} \mathbb{R}^{++} f_j = \Psi_G$
then follows, since non-boundary points $x$ can be perturbed by some
$\sum_{j=2}^{n-1} \epsilon_j f_j$ for $\epsilon_j > 0$ without leaving the set, implying that the projection of
$x - \mathbf{1}$ onto each $f_j$ is nonzero.

Let $\eta(j, k)$ denote the walk from $v_{\text{in}}$ to $v_{\text{out}}$ of the form

\[
(v_n, v_{n-1}, \ldots, v_{j+2}, v_{j+1}, v_j, \underbrace{v_{j+1}, v_j, \ldots, v_{j+1}, v_j}_{k \text{ times}}, v_{j-1}, \ldots, v_2, v_1),
\]

that is, a direct path with $k$ "steps back" at $j$ added, $k \ge 2$ and $2 \le j \le n - 1$.
(Write $\eta$ for the path with no steps backwards.) Clearly, $\mathrm{tr}_{\eta(j,k)} = k f_j + \mathbf{1}$.
By taking convex combinations of $\mathrm{tr}_{\eta(j,k)}$ and $\mathrm{tr}_\eta$ for sufficiently large $k$, one
can construct any $\alpha f_j + \mathbf{1}$ with $\alpha \ge 0$. Then, by taking convex combinations
of the resulting vectors, the inclusion $\mathbf{1} + \sum_{j=2}^{n-1} \mathbb{R}^{+} f_j \subseteq \overline{\Psi}_G$ follows.

For the opposite inclusion, it suffices to show that $\mathrm{tr}_\omega \in \mathbf{1} + \sum_{j=2}^{n-1} \mathbb{R}^{+} f_j$
for each proper walk $\omega$. We show this inductively: if $\omega = \eta$, the statement
evidently holds. Hence, assume that, for some $t > 0$, $\omega(t + 2) = \omega(t) = v_j$
and $\omega(t + 1) = v_{j+1}$. Every proper walk other than $\eta$ admits such a $t$ since,
for example, we may take $v_j \to v_{j+1} \to v_j$ to be the last step backwards.
Then

\[
\mathrm{tr}_\omega = \mathrm{tr}_{\omega'} + f_j,
\]
where $\omega'$ is the proper walk $\omega$ with steps $t + 1$ and $t + 2$ removed. Clearly,
by iterating this argument, we arrive at a representation of the form

\[
\mathrm{tr}_\omega = \mathbf{1} + \sum_{j=2}^{n-1} \alpha_j f_j
\]

with $\alpha_j \ge 0$. □

We need the following lemma, which allows us to compute expected
occupation time vectors as eigenvectors of a certain matrix.

Lemma 7. Let $n = |V(G)|$, where $G$ is a weighted graph with $\mathrm{wt}(v) = \beta_v$
and distinguished vertices $v_{\text{in}}$, $v_{\text{out}}$. There is a unique nonnegative vector
$\tau \in \mathbb{R}^{V(G)}$ so that $\tau_{v_{\text{out}}} = 1$, $\|\tau\|_1 > 1$, and $M\tau = \tau$, where $M \in \mathbb{R}^{n \times n}$ is
defined by

\[
M_{v,w} = \begin{cases} 1 & \text{if } v = w = v_{\text{out}} \text{ or } (v, w) = (v_{\text{in}}, v_{\text{out}}), \\ \beta_v / \sum_{u \sim w} \beta_u & \text{if } v \sim w \text{ and } v, w \ne v_{\text{out}}, \\ 0 & \text{otherwise.} \end{cases}
\]

Furthermore, $\tau_v$ is the expected number of visits in a proper random walk on
$G$ with weights $\{\beta_v\}_{v \in V(G)}$.

Proof. Let $\Gamma$ be the weighted digraph whose adjacency matrix is $M$. Then $\Gamma$
consists of $G$ with each edge incident to $v_{\text{out}}$ removed, plus a single directed
edge from $v_{\text{out}}$ to $v_{\text{in}}$. In particular, $M^n$ has all positive entries, except
for the nondiagonal elements of its first column (which are 0). Suppose
$\tau_{v_{\text{out}}} = \tau'_{v_{\text{out}}} = 1$, $\|\tau\| > 1$, $\|\tau'\| > 1$, $M\tau = \tau$, $M\tau' = \tau'$, but $\tau \ne \tau'$. Then
$M^n(\tau - \tau') = \tau - \tau'$ as well. Note that there exists some strictly positive vector
$u$ so that $u \cdot \tau = u \cdot \tau' = 1$: simply choose a positive vector orthogonal to $\tau - \tau'$
and scale it so that its dot product with $\tau$ is 1. (The vector $\tau - \tau'$ has positive
and negative entries since $\tau$ and $\tau'$ each have at least two positive entries,
and they are not the same vector.) If we replace the first row of $M^n$ with the
vector $u$, obtaining a new matrix $M'$, then $M'(\tau - \tau') = \tau - \tau'$. The
Perron-Frobenius Theorem implies that $\tau - \tau'$ has all positive entries. However, its first
coordinate is 0, a contradiction unless $\tau = \tau'$, which is also a contradiction.
Hence, the solution to $M\tau = \tau$ is unique.

We therefore need only show that the vector $\tau$ of expected number of
visits satisfies $M\tau = \tau$, since $\tau_{v_{\text{out}}} = 1$ and $\tau$ has additional nonzero entries.
It is clear that $(M\tau)_{v_{\text{out}}} = 1 = \tau_{v_{\text{out}}}$. Since $\tau_v$ is the $\beta_v / \sum_{u \sim w} \beta_u$ weighted
sum of $\tau_w$ for $w \sim v$, where $v \ne v_{\text{in}}, v_{\text{out}}$, the claim also holds for these $v$'s.
As for $\tau_{v_{\text{in}}}$, the expected number of visits to $v_{\text{in}}$ is the weighted sum of its
neighbors' expected number of visits, plus 1, since the first visit to $v_{\text{in}}$ is not
preceded by a visit to any other vertex. However, this extra "1" comes from
the $(v, w) = (v_{\text{in}}, v_{\text{out}})$ term in $M$, because $\tau_{v_{\text{out}}} = 1$. □



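Theorem 8. For $G$ a path, $\tau_p = \tau$ is solvable iff $\tau \in \Psi_G$.

Proof. By Proposition 6, we may assume that $\tau = \mathbf{1} + \sum_{j=2}^{n-1} \alpha_j f_j$ with $\alpha_j > 0$
for all $j$, $2 \le j \le n - 1$. Let $p(v_1) = 1$, $p(v_2) = 1$, and, for $j > 2$,

\[
p(v_j) = \frac{\prod_{k=0}^{\lfloor (j-3)/2 \rfloor} \alpha_{j-2k-1}}{\prod_{k=0}^{\lfloor (j-4)/2 \rfloor} (1 + \alpha_{j-2k-2})},
\]

where we interpret an empty product as 1. Let $\beta_j = p(v_j)$. To see that
$\tau_p = \tau$, we need to show that $M\tau = \tau$, where $M$ is the $n \times n$ matrix whose
nonzero entries are

\[
\begin{aligned}
M_{1,1} &= 1, & M_{n,1} &= 1, & M_{n-1,n} &= 1, \\
M_{j,j+1} &= \frac{\beta_j}{\beta_j + \beta_{j+2}} \quad (2 \le j \le n - 2), & M_{j,j-1} &= \frac{\beta_j}{\beta_{j-2} + \beta_j} \quad (3 \le j \le n).
\end{aligned}
\]

This will suffice to provide the result, since by Lemma 7, $\tau$ is the unique
solution to $M\tau = \tau$ with $\tau(v_{\text{out}}) = 1$.
Recall that

\[
\tau = (1, \ 1 + \alpha_2, \ 1 + \alpha_2 + \alpha_3, \ \ldots, \ 1 + \alpha_{n-2} + \alpha_{n-1}, \ 1 + \alpha_{n-1})^{*}.
\]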
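Hence, the first coordinate of $M\tau$ is $1 = \tau(1)$. The second coordinate of $M\tau$
is

\[
\frac{\beta_2}{\beta_2 + \beta_4} \tau(3) = \frac{1}{1 + \frac{\alpha_3}{1 + \alpha_2}} (1 + \alpha_2 + \alpha_3) = \frac{1 + \alpha_2}{1 + \alpha_2 + \alpha_3} (1 + \alpha_2 + \alpha_3) = 1 + \alpha_2 = \tau(2).
\]

Let $p_j = \beta_j / (\beta_{j-2} + \beta_j)$ and $q_j = \beta_j / (\beta_j + \beta_{j+2}) = 1 - p_{j+2}$. Then, with the convention $\alpha_1 = 0$,

\[
p_j = \left( \frac{\beta_{j-2}}{\beta_j} + 1 \right)^{-1}
= \left( \frac{\prod_{k=0}^{\lfloor (j-5)/2 \rfloor} \alpha_{j-2k-3} \ \prod_{k=0}^{\lfloor (j-4)/2 \rfloor} (1 + \alpha_{j-2k-2})}{\prod_{k=0}^{\lfloor (j-6)/2 \rfloor} (1 + \alpha_{j-2k-4}) \ \prod_{k=0}^{\lfloor (j-3)/2 \rfloor} \alpha_{j-2k-1}} + 1 \right)^{-1}
= \frac{\alpha_{j-1}}{1 + \alpha_{j-2} + \alpha_{j-1}},
\]

and

\[
q_j = 1 - p_{j+2} = \frac{1 + \alpha_j}{1 + \alpha_j + \alpha_{j+1}}.
\]

Then, for $3 \le j \le n - 2$, the $j$th coordinate of $M\tau$ is given by

\[
\begin{aligned}
p_j \tau(j - 1) + q_j \tau(j + 1)
&= \frac{\alpha_{j-1}}{1 + \alpha_{j-2} + \alpha_{j-1}} (1 + \alpha_{j-2} + \alpha_{j-1}) + \frac{1 + \alpha_j}{1 + \alpha_j + \alpha_{j+1}} (1 + \alpha_j + \alpha_{j+1}) \\
&= 1 + \alpha_{j-1} + \alpha_j = \tau(j).
\end{aligned}
\]

The $(n - 1)$st coordinate of $M\tau$ is

\[
\begin{aligned}
p_{n-1} \tau(n - 2) + \tau(n)
&= \frac{\alpha_{n-2}}{1 + \alpha_{n-3} + \alpha_{n-2}} (1 + \alpha_{n-3} + \alpha_{n-2}) + (1 + \alpha_{n-1}) \\
&= 1 + \alpha_{n-2} + \alpha_{n-1} = \tau(n - 1),
\end{aligned}
\]

and the $n$th coordinate of $M\tau$ is

\[
1 + p_n \tau(n - 1) = 1 + \frac{\alpha_{n-1}}{1 + \alpha_{n-2} + \alpha_{n-1}} (1 + \alpha_{n-2} + \alpha_{n-1}) = 1 + \alpha_{n-1} = \tau(n).
\]

□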
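
The path construction lends itself to an exact check. The following sketch is an illustration under assumptions: it takes the closed form for $p(v_j)$ to be the ratio of products $\prod_{k=0}^{\lfloor (j-3)/2 \rfloor} \alpha_{j-2k-1} \big/ \prod_{k=0}^{\lfloor (j-4)/2 \rfloor} (1 + \alpha_{j-2k-2})$ as read from the proof, uses a hypothetical solver `expected_visits`, and picks arbitrary values of $\alpha_j$ on a five-vertex path with vertices labelled $1, \ldots, n$.

```python
from fractions import Fraction
from math import prod

def expected_visits(adj, p, v_in, v_out):
    """Exact expected visit counts for a walk from v_in absorbed at v_out."""
    verts = [v for v in adj if v != v_out]
    idx = {v: i for i, v in enumerate(verts)}
    m = len(verts)
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]  # (I - Q^T | e_in)
    for v in verts:
        A[idx[v]][idx[v]] += 1
        total = sum(p[z] for z in adj[v])
        for y in adj[v]:
            if y != v_out:
                A[idx[y]][idx[v]] -= Fraction(p[y]) / total
    A[idx[v_in]][m] = Fraction(1)
    for c in range(m):  # Gauss-Jordan elimination
        piv = next(r for r in range(c, m) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(m):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    tau = {v: A[idx[v]][m] for v in verts}
    tau[v_out] = Fraction(1)
    return tau

def path_weights(alpha):
    """Weights p(v_1), ..., p(v_n) from alpha_2, ..., alpha_{n-1}."""
    n = max(alpha) + 1
    p = {1: Fraction(1), 2: Fraction(1)}
    for j in range(3, n + 1):
        num = prod(alpha[j - 2 * k - 1] for k in range((j - 3) // 2 + 1))
        den = prod(1 + alpha[j - 2 * k - 2] for k in range((j - 4) // 2 + 1))
        p[j] = Fraction(num) / den
    return p

# Hypothetical coefficients on a path with n = 5 vertices.
alpha = {2: Fraction(1), 3: Fraction(2), 4: Fraction(1, 2)}
n = max(alpha) + 1
p = path_weights(alpha)
adj = {j: [i for i in (j - 1, j + 1) if 1 <= i <= n] for j in range(1, n + 1)}
tau = expected_visits(adj, p, n, 1)  # v_in = v_n, v_out = v_1
# Coordinates of 1 + sum_j alpha_j (e_j + e_{j+1}).
target = {j: 1 + alpha.get(j - 1, 0) + alpha.get(j, 0) for j in range(1, n + 1)}
```

For these values the computed weights are $(1, 1, 1, 1, 1/6)$ and the expected visits coincide with the target vector $(1, 2, 4, 7/2, 3/2)$.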
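
Theorem 9. For $G = K_n$ a complete graph with vertices $v_1 = v_{\text{out}}$, $v_2 =
v_{\text{in}}$, $v_3, \ldots, v_n$, $\tau_p = \tau$ is solvable iff $\tau \in \Psi_G$.

Proof. First we give a description of the solvable $\tau$'s. In order to simplify
our calculations, we will assume (without loss of generality) that the weights
$\beta_1, \ldots, \beta_n$ sum to 1. Therefore, define $M$ to be

\[
M = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & 0 & \frac{\beta_2}{1 - \beta_3} & \cdots & \frac{\beta_2}{1 - \beta_n} \\
0 & \frac{\beta_3}{1 - \beta_2} & 0 & \cdots & \frac{\beta_3}{1 - \beta_n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \frac{\beta_n}{1 - \beta_2} & \frac{\beta_n}{1 - \beta_3} & \cdots & 0
\end{pmatrix}
\]

so that $M\tau = \tau$ by Lemma 7. Note that the lower-right $(n - 1) \times (n - 1)$
submatrix of $M$ is simply

\[
\operatorname{diag}(\beta_2, \ldots, \beta_n) \, (J - I) \, \operatorname{diag}\bigl( (1 - \beta_2)^{-1}, \ldots, (1 - \beta_n)^{-1} \bigr),
\]

where $J \in \mathbb{R}^{(n-1) \times (n-1)}$ is the all ones matrix, and $I$ is the identity. We show
that the following is a solution to $M\tau = \tau$ (and therefore the unique one with
$\tau_1 = 1$):

\[
\tau_j = \begin{cases} \beta_j (1 - \beta_j) / \beta_1 & \text{if } j \ne 1, 2, \\ 1 & \text{if } j = 1, \\ (1 + \beta_2 / \beta_1)(1 - \beta_2) & \text{if } j = 2. \end{cases}
\]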

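It is clear that $(M\tau)_1 = 1 = \tau_1$. If $j \ne 1, 2$, then

\[
\begin{aligned}
(M\tau)_j &= \frac{\beta_j (1 + \beta_2 / \beta_1)(1 - \beta_2)}{1 - \beta_2} + \sum_{\substack{i \ge 3 \\ i \ne j}} \frac{\beta_i (1 - \beta_i)}{\beta_1} \cdot \frac{\beta_j}{1 - \beta_i} \\
&= \frac{\beta_j (\beta_1 + \beta_2)}{\beta_1} + \frac{\beta_j}{\beta_1} \sum_{\substack{i \ge 3 \\ i \ne j}} \beta_i \\
&= \frac{\beta_j}{\beta_1} \bigl( \beta_1 + \beta_2 + (1 - \beta_1 - \beta_2 - \beta_j) \bigr) = \frac{\beta_j (1 - \beta_j)}{\beta_1}.
\end{aligned}
\]

It remains to check $\tau_2$:

\[
(M\tau)_2 = 1 + \sum_{i=3}^{n} \frac{\beta_i (1 - \beta_i)}{\beta_1} \cdot \frac{\beta_2}{1 - \beta_i} = 1 + \frac{\beta_2}{\beta_1} (1 - \beta_1 - \beta_2) = \frac{(\beta_1 + \beta_2)(1 - \beta_2)}{\beta_1}.
\]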
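Now, if $\tau_j = \beta_j (1 - \beta_j) / \beta_1$ for each $j \ge 3$, then

\[
\beta_j = \frac{1 \pm \sqrt{1 - 4 \tau_j \beta_1}}{2},
\]

with $\pm$ interpreted to be addition if $\beta_j > 1/2$ and subtraction otherwise.
Note that at most one of the $\beta_j$ can exceed $1/2$ since $\sum_{j=1}^{n} \beta_j = 1$.

Suppose for the moment that $\beta_j \le 1/2$ for $j \ge 3$. Then we can write
$\beta_2 = 1 - \sum_{j \ne 2} \beta_j$, whence

\[
\tau_2 = \frac{(\beta_1 + \beta_2)(1 - \beta_2)}{\beta_1}
= \frac{\bigl( 2 - \sum_{j \ne 1,2} (1 - \sqrt{1 - 4 \tau_j \beta_1}) \bigr) \bigl( 2 \beta_1 + \sum_{j \ne 1,2} (1 - \sqrt{1 - 4 \tau_j \beta_1}) \bigr)}{4 \beta_1}.
\]

This expression is defined for $\beta_1 \in (0, \min_{j \ge 3} (4 \tau_j)^{-1}]$. We will assume
for convenience that $\tau_3 = \max_{j \ge 3} \tau_j$, so $0 < \beta_1 \le (4 \tau_3)^{-1}$. Let $u_j = 1 -
\sqrt{1 - 4 \tau_j \beta_1}$. Then, when $\beta_1 \to 0^{+}$, we have

\[
\begin{aligned}
\lim_{\beta_1 \to 0^{+}} \tau_2
&= \lim_{\beta_1 \to 0^{+}} \frac{\bigl( 2 - \sum_{j \ne 1,2} u_j \bigr) \bigl( 2 \beta_1 + \sum_{j \ne 1,2} u_j \bigr)}{4 \beta_1} \\
&= \frac{1}{4} \left[ - \Bigl( 2 \beta_1 + \sum_{j \ne 1,2} u_j \Bigr) \sum_{j \ne 1,2} \frac{d u_j}{d \beta_1} + \Bigl( 2 - \sum_{j \ne 1,2} u_j \Bigr) \Bigl( 2 + \sum_{j \ne 1,2} \frac{d u_j}{d \beta_1} \Bigr) \right]_{\beta_1 = 0}
\end{aligned}
\]

by L'Hôpital's Rule. Since $u_j |_{\beta_1 = 0} = 0$ and

\[
\left. \frac{d u_j}{d \beta_1} \right|_{\beta_1 = 0} = \left. \frac{2 \tau_j}{\sqrt{1 - 4 \tau_j \beta_1}} \right|_{\beta_1 = 0} = 2 \tau_j,
\]

we have

\[
\lim_{\beta_1 \to 0^{+}} \tau_2 = 1 + \sum_{j \ge 3} \tau_j.
\]

On the other hand, if $\beta_1 = (4 \tau_3)^{-1}$, then

\[
\tau_2 = \Bigl( 2 - \sum_{j \ge 3} \bigl( 1 - \sqrt{1 - \tau_j / \tau_3} \bigr) \Bigr) \Bigl( \frac{1}{2} + \tau_3 \sum_{j \ge 3} \bigl( 1 - \sqrt{1 - \tau_j / \tau_3} \bigr) \Bigr). \tag{5}
\]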
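Now, if $\beta_j > 1/2$ for some $j > 2$, we may assume without loss of generality
that $j = 3$. Note that

\[
\tau_j = \beta_j (1 - \beta_j) / \beta_1 > a (1 - a) / \beta_1
\]

for all $a < 1 - \beta_j$. But $\beta_i < 1 - \beta_j$ for all $i \ne j$, so $\tau_j > \tau_i$ for all $i \ne 1, 2, j$.
Then, we have again that $\tau_3 = \max_{j \ge 3} \tau_j$. Letting $u_j = 1 - \sqrt{1 - 4 \tau_j \beta_1}$ be as above for $j > 3$
and $u_3 = 1 + \sqrt{1 - 4 \tau_3 \beta_1}$, we have

\[
\tau_2 = \frac{(\beta_1 + \beta_2)(1 - \beta_2)}{\beta_1}
= \frac{\bigl( 1 - \sum_{j \ne 1,2} \beta_j \bigr) \sum_{j \ne 2} \beta_j}{\beta_1}
= \frac{\bigl( 2 - \sum_{j \ge 3} u_j \bigr) \bigl( 2 \beta_1 + \sum_{j \ge 3} u_j \bigr)}{4 \beta_1}.
\]

This expression is again defined for any $\beta_1 \in (0, (4 \tau_3)^{-1}]$. The above
expression agrees with (5) when $\beta_1 = (4 \tau_3)^{-1}$, since then $\sqrt{1 - 4 \tau_3 \beta_1} = 0$. On the other
hand, when $\beta_1 \to 0^{+}$,

\[
\begin{aligned}
\lim_{\beta_1 \to 0^{+}} \tau_2
&= \lim_{\beta_1 \to 0^{+}} \frac{\bigl( 2 - \sum_{j \ge 3} u_j \bigr) \bigl( 2 \beta_1 + \sum_{j \ge 3} u_j \bigr)}{4 \beta_1} \\
&= \frac{1}{4} \left[ - \Bigl( 2 \beta_1 + \sum_{j \ne 1,2} u_j \Bigr) \sum_{j \ne 1,2} \frac{d u_j}{d \beta_1} + \Bigl( 2 - \sum_{j \ne 1,2} u_j \Bigr) \Bigl( 2 + \sum_{j \ne 1,2} \frac{d u_j}{d \beta_1} \Bigr) \right]_{\beta_1 = 0} \\
&= - \frac{1}{2} \sum_{j \ne 1,2} \left. \frac{d u_j}{d \beta_1} \right|_{\beta_1 = 0},
\end{aligned}
\]

since $u_j = 0$ when $\beta_1 = 0$ except for $u_3$, which is 2. Now, $\frac{d u_j}{d \beta_1} \big|_{\beta_1 = 0} = 2 \tau_j$ for $j > 3$,
but $\frac{d u_3}{d \beta_1} \big|_{\beta_1 = 0} = -2 \tau_3$. Hence,

\[
\lim_{\beta_1 \to 0^{+}} \tau_2 = \tau_3 - \sum_{j > 3} \tau_j.
\]

We may conclude, by the Intermediate Value Theorem, that $\tau_p = \tau$ is solvable
as long as $\tau_1 = 1$ and

\[
\tau_3 - \sum_{j > 3} \tau_j < \tau_2 < 1 + \sum_{j > 2} \tau_j.
\]

We claim that this inequality holds for all elements of $\Psi_G$. To see the upper
inequality, consider the fact that each visit (after the first) to $v_2$ of a proper
walk is preceded by a visit to some $v_j$ with $j > 2$. Hence $\tau_2$ is at most one
more than $\sum_{j > 2} \tau_j$. To see the lower inequality, we write it thusly:

\[
\tau_3 < \sum_{j \ne 1, 3} \tau_j.
\]

Again, every visit to $v_3$ in a proper walk is preceded by a visit to some $v_j$
with $j \ne 1, 3$. The inequality, and the theorem, follows. □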
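
The closed form for the occupation times on a complete graph, $\tau_j = \beta_j(1 - \beta_j)/\beta_1$ for $j \ge 3$ and $\tau_2 = (1 + \beta_2/\beta_1)(1 - \beta_2)$, can be compared against a direct computation. The sketch below is an illustration only: the solver `expected_visits`, the choice of $K_4$, and the weights $(1/10, 2/10, 3/10, 4/10)$ are assumptions.

```python
from fractions import Fraction

def expected_visits(adj, p, v_in, v_out):
    """Exact expected visit counts for a walk from v_in absorbed at v_out."""
    verts = [v for v in adj if v != v_out]
    idx = {v: i for i, v in enumerate(verts)}
    m = len(verts)
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]  # (I - Q^T | e_in)
    for v in verts:
        A[idx[v]][idx[v]] += 1
        total = sum(p[z] for z in adj[v])
        for y in adj[v]:
            if y != v_out:
                A[idx[y]][idx[v]] -= Fraction(p[y]) / total
    A[idx[v_in]][m] = Fraction(1)
    for c in range(m):  # Gauss-Jordan elimination
        piv = next(r for r in range(c, m) if A[r][c] != 0)
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(m):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    tau = {v: A[idx[v]][m] for v in verts}
    tau[v_out] = Fraction(1)
    return tau

# Hypothetical K_4 with v_1 = v_out, v_2 = v_in and weights summing to 1.
n = 4
beta = {j: Fraction(j, 10) for j in range(1, n + 1)}
adj = {j: [i for i in range(1, n + 1) if i != j] for j in range(1, n + 1)}
tau = expected_visits(adj, beta, 2, 1)

# Closed form stated in the proof.
formula = {1: Fraction(1),
           2: (1 + beta[2] / beta[1]) * (1 - beta[2])}
for j in range(3, n + 1):
    formula[j] = beta[j] * (1 - beta[j]) / beta[1]
```

For these weights both computations give $(1, 12/5, 21/10, 12/5)$.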



5 Open problems 

The following are unsolved problems that have arisen in the current study 
and which we would like to see addressed. 

1. (Conjecture 1) For which $\tau$ is it possible to solve for the weights in the
equation $\tau_p = \tau$?

2. Is it true that the iterated numerical solution described above always
yields the correct answer, assuming a solution exists? To put it another
way, is there a unique local minimizer of $\|\tau_p - \tau\|_2^2$ for a given $\tau$?

3. If more information is available about the routes that random walkers 
take than just the empirical mean occupation times, could one exploit 
this to more efficiently obtain the weights, or to obtain a "better" set 
of weights? 

4. Suppose some measure of expertise is used after the weights are
obtained. For example, one might ask for the correlation coefficient
between the weights and the distance function $f : V(G) \to \mathbb{N}$ given by
$f(v) = d(v, v_{\text{out}})$. How well does this scheme classify novices and
experts?

5. How well does the method-of-moments estimator we introduce above 
perform, in terms of bias or mean-squared error, for example? 